Introduction¶

In this notebook we'll learn how to use NumPy to work with numerical data.

Import Statements¶

In [1]:
import numpy as np

Understanding NumPy's ndarray¶

The crown jewel of NumPy is the ndarray. The ndarray is a homogeneous n-dimensional array object. What does that mean?

A Python List or a Pandas DataFrame can contain a mix of strings, numbers, or objects (i.e., a mix of different types). Homogeneous means all the data must have the same data type, for example all floating-point numbers.

And n-dimensional means that we can work with everything from a single column (1-dimensional) to a matrix (2-dimensional) to a bunch of matrices stacked on top of each other (n-dimensional).
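
A small sketch may make both terms concrete. The variable names below are made up for illustration, and the snippet assumes only that NumPy is installed. The .ndim attribute it uses reports the number of dimensions.

```python
import numpy as np

# Homogeneous: mixing ints and floats yields one common data type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                            # float64: the ints became floats

# n-dimensional: the nesting depth of the brackets sets the number of dimensions.
vector = np.array([1, 2, 3])                  # 1-dimensional
matrix = np.array([[1, 2], [3, 4]])           # 2-dimensional
stack = np.array([[[1, 2], [3, 4]],
                  [[5, 6], [7, 8]]])          # 3-dimensional
print(vector.ndim, matrix.ndim, stack.ndim)   # 1 2 3
```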

1-Dimensional Arrays (Vectors)¶

In [2]:
my_array = np.array([1.1, 9.2, 8.1, 4.7])

We can see that my_array is 1-dimensional by looking at its shape:

In [3]:
my_array.shape
Out[3]:
(4,)

We access an element in an ndarray much as we would with a Python List, namely by that element's index:

In [4]:
my_array[2]
Out[4]:
8.1

Let’s check the dimensions of my_array with the ndim attribute:

In [5]:
my_array.ndim
Out[5]:
1

2-Dimensional Arrays (Matrices)¶

In [6]:
array_2d = np.array([[1, 2, 3, 9], 
                     [5, 6, 7, 8]])

Note we have two pairs of square brackets. This array has 2 rows and 4 columns. NumPy refers to the dimensions as axes, so the first axis has length 2 and the second axis has length 4.

In [7]:
print(f'array_2d has {array_2d.ndim} dimensions')
print(f'Its shape is {array_2d.shape}')
print(f'It has {array_2d.shape[0]} rows and {array_2d.shape[1]} columns')
print(array_2d)
array_2d has 2 dimensions
Its shape is (2, 4)
It has 2 rows and 4 columns
[[1 2 3 9]
 [5 6 7 8]]

To access a particular value, you have to provide an index for each dimension. We have two dimensions, so we need to provide an index for the row and for the column. Here’s how to access the 3rd value in the 2nd row:

In [8]:
array_2d[1,2]
Out[8]:
7

To access an entire row and all the values therein, you can use the : operator just like you would do with a Python List. Here’s the entire first row:

In [9]:
array_2d[0, :]
Out[9]:
array([1, 2, 3, 9])
In [10]:
array_2d[1, 2:4]
Out[10]:
array([7, 8])
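
The two cells above pick values out of a single row. As a sketch of the same idea in the other direction, the following selects whole columns. The array is re-created so the snippet runs on its own, and the names first_column and middle_block are illustrative.

```python
import numpy as np

array_2d = np.array([[1, 2, 3, 9],
                     [5, 6, 7, 8]])

first_column = array_2d[:, 0]     # every row, column 0
middle_block = array_2d[:, 1:3]   # every row, columns 1 and 2

print(first_column)   # [1 5]
print(middle_block)   # [[2 3]
                      #  [6 7]]
```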

N-Dimensional Arrays (Tensors)¶

An array of 3 dimensions (or higher) is often referred to as a "tensor". Yes, that’s also where TensorFlow, the popular machine learning tool, gets its name. A tensor simply refers to an n-dimensional array. Using what you've learned about 1- and 2-dimensional arrays, can you apply the same techniques to tackle a more complex array?

Challenge:

  • How many dimensions does the array below have?
  • What is its shape (i.e., how many elements are along each axis)?
  • Try to access the value 18 in the last line of code.
  • Try to retrieve a 1 dimensional vector with the values [97, 0, 27, 18]
  • Try to retrieve a (3,2) matrix with the values [[ 0, 4], [ 7, 5], [ 5, 97]]

Hint: You can use the : operator just as with Python Lists.

In [11]:
mystery_array = np.array([[[0, 1, 2, 3],
                           [4, 5, 6, 7]],
                        
                         [[7, 86, 6, 98],
                          [5, 1, 0, 4]],
                          
                          [[5, 36, 32, 48],
                           [97, 0, 27, 18]]])

# Note all the square brackets!
In [12]:
print(f"The array has {mystery_array.ndim} dimensions.")
The array has 3 dimensions.
In [13]:
print(f'Its shape is {mystery_array.shape}')
print(f'It has {mystery_array.shape[0]} elements in the first axis, {mystery_array.shape[1]} elements in the 2nd axis and {mystery_array.shape[2]} elements in the 3rd axis.')
Its shape is (3, 2, 4)
It has 3 elements in the first axis, 2 elements in the 2nd axis and 4 elements in the 3rd axis.

The shape is (3, 2, 4), so we have 3 elements along axis #0, 2 elements along axis #1 and 4 elements along axis #2.

In [14]:
mystery_array[2, 1, 3]
Out[14]:
18
In [15]:
mystery_array[2, 1]
# mystery_array[2, 1, :]
Out[15]:
array([97,  0, 27, 18])
In [16]:
mystery_array[:, :, :1]
Out[16]:
array([[[ 0],
        [ 4]],

       [[ 7],
        [ 5]],

       [[ 5],
        [97]]])
In [17]:
mystery_array[:, :, 0]
Out[17]:
array([[ 0,  4],
       [ 7,  5],
       [ 5, 97]])
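
The last two cells differ only in :1 versus 0, yet the results have different shapes. A minimal sketch of why, using the same array: a slice keeps the axis (with length 1), while an integer index removes it. The names kept and dropped are illustrative.

```python
import numpy as np

mystery_array = np.array([[[0, 1, 2, 3], [4, 5, 6, 7]],
                          [[7, 86, 6, 98], [5, 1, 0, 4]],
                          [[5, 36, 32, 48], [97, 0, 27, 18]]])

kept = mystery_array[:, :, :1]     # slice: the third axis survives with length 1
dropped = mystery_array[:, :, 0]   # integer index: the third axis is removed

print(kept.shape)      # (3, 2, 1)
print(dropped.shape)   # (3, 2)
```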

NumPy Mini-Challenges¶

Challenge 1: Use .arange() to create a vector a with values ranging from 10 to 29. You should get this:¶

print(a)

[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]

In [18]:
a = np.arange(10, 30)
a
Out[18]:
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
       27, 28, 29])

Challenge 2: Use Python slicing techniques on a to:¶

  • Create an array containing only the last 3 values of a
  • Create a subset with only the 4th, 5th, and 6th values
  • Create a subset of a containing all the values except for the first 12 (i.e., [22, 23, 24, 25, 26, 27, 28, 29])
  • Create a subset that only contains the even numbers (i.e., every second number)
In [19]:
a[-3:]
Out[19]:
array([27, 28, 29])
In [20]:
a[3:6]
Out[20]:
array([13, 14, 15])
In [21]:
a[12:]
Out[21]:
array([22, 23, 24, 25, 26, 27, 28, 29])
In [22]:
a[::2]
Out[22]:
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
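
All four answers are instances of the general start:stop:step slice. As a sketch (odd_values and every_third are illustrative names, not part of the challenge):

```python
import numpy as np

a = np.arange(10, 30)

odd_values = a[1::2]      # start at index 1, then take every second value
every_third = a[0:10:3]   # indices 0, 3, 6 and 9

print(odd_values)    # [11 13 15 17 19 21 23 25 27 29]
print(every_third)   # [10 13 16 19]
```

Note that a[::2] returns the even numbers only because a happens to start at an even value: a slice selects positions, not values.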

Challenge 3: Reverse the order of the values in a, so that the first element comes last:¶

[29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]

If you need a hint, you can check out this part of the NumPy beginner's guide

In [23]:
a[::-1]
Out[23]:
array([29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13,
       12, 11, 10])
In [24]:
np.flip(a)
Out[24]:
array([29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13,
       12, 11, 10])

Challenge 4: Print out all the indices of the non-zero elements in this array: [6,0,9,0,0,5,0]¶

In [25]:
b = np.array([6,0,9,0,0,5,0])
for i,v in enumerate(b):
    if v != 0:
        print(f"{i}, {v}")
0, 6
2, 9
5, 5
In [26]:
indices = [index for index, value in enumerate(b) if value!=0]
indices
Out[26]:
[0, 2, 5]
In [27]:
np.nonzero(b)
Out[27]:
(array([0, 2, 5], dtype=int64),)
In [28]:
type(np.nonzero(b)) # .nonzero() returns a Tuple
Out[28]:
tuple
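
np.nonzero returns a tuple because it yields one index array per dimension. A minimal sketch, where grid is a made-up 2-dimensional example:

```python
import numpy as np

b = np.array([6, 0, 9, 0, 0, 5, 0])
(indices,) = np.nonzero(b)      # 1-D input: a tuple holding a single array
print(indices)                  # [0 2 5]

grid = np.array([[0, 3],
                 [4, 0]])
rows, cols = np.nonzero(grid)   # 2-D input: one index array per axis
print(rows, cols)               # [0 1] [1 0]
```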

Challenge 5: Use NumPy to generate a 3x3x3 array with random numbers¶

Hint: Use the .random() function

In [29]:
# legacy version
from numpy import random
np.random.random((3,3,3))
Out[29]:
array([[[0.78048386, 0.24425197, 0.37207726],
        [0.2994375 , 0.66881466, 0.04500482],
        [0.75640622, 0.61570753, 0.95039286]],

       [[0.76900265, 0.07279341, 0.83870573],
        [0.74451673, 0.72846878, 0.51726528],
        [0.14515017, 0.50006444, 0.42489008]],

       [[0.01759802, 0.15265102, 0.6726701 ],
        [0.15849926, 0.86739723, 0.49781934],
        [0.36518408, 0.98851795, 0.99928789]]])
In [30]:
# new version
from numpy.random import default_rng
rng = default_rng()
rng.random((3,3,3))
Out[30]:
array([[[0.00891402, 0.45450988, 0.69254832],
        [0.79143745, 0.49159704, 0.51504037],
        [0.22352877, 0.36219485, 0.3731767 ]],

       [[0.24402422, 0.55758883, 0.17159972],
        [0.0478456 , 0.85032138, 0.04643565],
        [0.97250857, 0.96028891, 0.6114276 ]],

       [[0.064536  , 0.54536048, 0.37821912],
        [0.32100515, 0.06392893, 0.96831539],
        [0.69015721, 0.2743039 , 0.39261461]]])
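
Because the generator above is unseeded, its numbers change on every run. If reproducible output is needed, default_rng accepts a seed. The value 42 below is an arbitrary choice for illustration, as are the names rng_a and rng_b.

```python
import numpy as np
from numpy.random import default_rng

rng_a = default_rng(42)   # 42 is an arbitrary seed
rng_b = default_rng(42)   # same seed, so the same sequence of numbers

sample_a = rng_a.random((3, 3, 3))
sample_b = rng_b.random((3, 3, 3))

print(sample_a.shape)                       # (3, 3, 3)
print(np.array_equal(sample_a, sample_b))   # True
```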

Challenge 6: Use .linspace() to create a vector x of size 9 with values spaced out evenly from 0 to 100 (both included).¶

In [31]:
x = np.linspace(start=0, stop=100, num=9)
x
Out[31]:
array([  0. ,  12.5,  25. ,  37.5,  50. ,  62.5,  75. ,  87.5, 100. ])
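
With 9 points there are 8 gaps, so the step is 100 / 8 = 12.5. As a sketch, the retstep argument of np.linspace reports that step, and np.arange shows the contrast: it takes a step size and excludes the stop value. The name r is illustrative.

```python
import numpy as np

x, step = np.linspace(start=0, stop=100, num=9, retstep=True)
print(step)             # 12.5
print(x[-1])            # 100.0 (stop is included)

r = np.arange(0, 100, 12.5)
print(r[-1])            # 87.5 (stop is excluded)
print(len(x), len(r))   # 9 8
```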

Challenge 7: Use .linspace() to create another vector y of size 9 with values from -3 to 3 (both included). Then plot x and y on a line chart using Matplotlib.¶

In [32]:
y = np.linspace(-3, 3, 9)
y
Out[32]:
array([-3.  , -2.25, -1.5 , -0.75,  0.  ,  0.75,  1.5 ,  2.25,  3.  ])
In [33]:
import matplotlib.pyplot as plt
plt.plot(x, y)
Out[33]:
[<matplotlib.lines.Line2D at 0x10ab7940>]

Challenge 8: Use NumPy to generate an array called noise with shape 128x128x3 that has random values. Then use Matplotlib's .imshow() to display the array as an image.¶

When we have a 3-dimensional array with values between 0 and 1, we can use Matplotlib to interpret these values as the red-green-blue (RGB) values for a pixel.

In [34]:
noise = rng.random((128,128,3))
plt.imshow(noise)
Out[34]:
<matplotlib.image.AxesImage at 0x10aff490>

Linear Algebra with Vectors¶

NumPy is designed to do math (and do it well!). This means that NumPy will treat vectors, matrices and tensors in a way that a mathematician would expect. For example, if you had two vectors:

In [35]:
v1 = np.array([4, 5, 2, 7])
v2 = np.array([2, 1, 3, 3])

And you add them together

In [36]:
v1 + v2
Out[36]:
array([ 6,  6,  5, 10])

The result is an ndarray in which each pair of corresponding elements has been added together.

In contrast, if we had two Python Lists

In [37]:
# Python Lists vs ndarrays
list1 = [4, 5, 2, 7]
list2 = [2, 1, 3, 3]

adding them together would just concatenate the lists.

In [38]:
list1 + list2
Out[38]:
[4, 5, 2, 7, 2, 1, 3, 3]

Multiplying the two vectors together also results in an element by element operation:

In [39]:
v1 * v2
Out[39]:
array([ 8,  5,  6, 21])

Broadcasting and Scalars¶

Now, oftentimes you'll want to do some sort of operation between an array and a single number. In mathematics, this single number is often called a scalar. For example, you might want to multiply every value in your NumPy array by 2:

In order to achieve this result, NumPy will make the shape of the smaller array - our scalar - compatible with the larger array. This is what the documentation refers to when it mentions the term "broadcasting".
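
A minimal sketch of the scalar case, using a small example vector with values chosen for illustration. np.broadcast_to is included only to make visible the "stretched" array that NumPy conceptually works with.

```python
import numpy as np

values = np.array([4, 5, 2, 7])

doubled = values * 2                           # the scalar 2 is applied to every element
stretched = np.broadcast_to(2, values.shape)   # what the scalar conceptually becomes

print(doubled)     # [ 8 10  4 14]
print(stretched)   # [2 2 2 2]
```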

The same rules about 'expanding' the smaller ndarray hold true for 2 or more dimensions. We can see this with a 2-Dimensional Array:

In [40]:
array_2d = np.array([[1, 2, 3, 4], 
                     [5, 6, 7, 8]])
In [41]:
array_2d + 10
Out[41]:
array([[11, 12, 13, 14],
       [15, 16, 17, 18]])
In [42]:
array_2d * 5
Out[42]:
array([[ 5, 10, 15, 20],
       [25, 30, 35, 40]])
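
Broadcasting is not limited to scalars. As a sketch, a 1-dimensional array whose length matches the last axis is stretched across every row (row_offsets is an illustrative name):

```python
import numpy as np

array_2d = np.array([[1, 2, 3, 4],
                     [5, 6, 7, 8]])
row_offsets = np.array([10, 20, 30, 40])   # shape (4,) matches the last axis of (2, 4)

result = array_2d + row_offsets            # row_offsets is added to each row
print(result)
# [[11 22 33 44]
#  [15 26 37 48]]
```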

Matrix Multiplication with @ and .matmul()¶

In [43]:
a1 = np.array([[1, 3],
               [0, 1],
               [6, 2],
               [9, 7]])

b1 = np.array([[4, 1, 3],
               [5, 8, 5]])

print(f'{a1.shape}: a has {a1.shape[0]} rows and {a1.shape[1]} columns.')
print(f'{b1.shape}: b has {b1.shape[0]} rows and {b1.shape[1]} columns.')
print('Dimensions of result: (4x2)*(2x3)=(4x3)')
(4, 2): a has 4 rows and 2 columns.
(2, 3): b has 2 rows and 3 columns.
Dimensions of result: (4x2)*(2x3)=(4x3)

Challenge: Let's multiply a1 with b1. Looking at the Wikipedia example above, work out the values for c12 and c33 on paper. Then use the .matmul() function or the @ operator to check your work.

See p. 172-173 notebook

In [44]:
np.matmul(a1, b1)
Out[44]:
array([[19, 25, 18],
       [ 5,  8,  5],
       [34, 22, 28],
       [71, 65, 62]])
In [45]:
a1 @ b1
Out[45]:
array([[19, 25, 18],
       [ 5,  8,  5],
       [34, 22, 28],
       [71, 65, 62]])
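
To check the paper calculation, recall that each entry of the product is the dot product of one row of a1 with one column of b1. A sketch for the two entries named in the challenge, using the 1-based names c12 and c33 for what NumPy indexes as [0, 1] and [2, 2]:

```python
import numpy as np

a1 = np.array([[1, 3], [0, 1], [6, 2], [9, 7]])
b1 = np.array([[4, 1, 3], [5, 8, 5]])

c12 = np.dot(a1[0, :], b1[:, 1])   # row 1 of a1 with column 2 of b1: 1*1 + 3*8
c33 = np.dot(a1[2, :], b1[:, 2])   # row 3 of a1 with column 3 of b1: 6*3 + 2*5

print(c12, c33)                    # 25 28
```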

Manipulating Images as ndarrays¶

Images are nothing other than a collection of pixels. And each pixel is nothing other than a value for a colour. And any colour can be represented as a combination of red, green, and blue (RGB).

In [46]:
from scipy import misc # contains an image of a raccoon! (newer SciPy releases offer it as scipy.datasets.face instead)
In [47]:
img = misc.face()
plt.imshow(img)
Out[47]:
<matplotlib.image.AxesImage at 0x13112d90>

Challenge: What is the data type of img? Also, what is the shape of img and how many dimensions does it have? What is the resolution of the image?

In [48]:
type(img)
Out[48]:
numpy.ndarray
In [49]:
img.shape
Out[49]:
(768, 1024, 3)
In [50]:
img.ndim
Out[50]:
3

"img" is an array of 3 dimensions, and has a resolution of 768x1024 pixels.

Let us question the nature of our reality and take a look under the surface. Here's what our "image" actually looks like:

In [51]:
img
Out[51]:
array([[[121, 112, 131],
        [138, 129, 148],
        [153, 144, 165],
        ...,
        [119, 126,  74],
        [131, 136,  82],
        [139, 144,  90]],

       [[ 89,  82, 100],
        [110, 103, 121],
        [130, 122, 143],
        ...,
        [118, 125,  71],
        [134, 141,  87],
        [146, 153,  99]],

       [[ 73,  66,  84],
        [ 94,  87, 105],
        [115, 108, 126],
        ...,
        [117, 126,  71],
        [133, 142,  87],
        [144, 153,  98]],

       ...,

       [[ 87, 106,  76],
        [ 94, 110,  81],
        [107, 124,  92],
        ...,
        [120, 158,  97],
        [119, 157,  96],
        [119, 158,  95]],

       [[ 85, 101,  72],
        [ 95, 111,  82],
        [112, 127,  96],
        ...,
        [121, 157,  96],
        [120, 156,  94],
        [120, 156,  94]],

       [[ 85, 101,  74],
        [ 97, 113,  84],
        [111, 126,  97],
        ...,
        [120, 156,  95],
        [119, 155,  93],
        [118, 154,  92]]], dtype=uint8)

There are three matrices stacked on top of each other - one for the red values, one for the green values and one for the blue values. Each matrix has 768 rows and 1024 columns, which makes sense since 768x1024 is the resolution of the image.
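
To see the "stacked matrices" idea in isolation, here is a sketch with a tiny made-up array standing in for the photo. The colour is the last axis, so indexing it with 0, 1 or 2 pulls out the red, green or blue matrix.

```python
import numpy as np

# A made-up 2x2 "image": each pixel holds [red, green, blue] as uint8.
tiny_img = np.array([[[255, 0, 0], [0, 255, 0]],
                     [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

red = tiny_img[:, :, 0]     # the red matrix
green = tiny_img[:, :, 1]   # the green matrix
blue = tiny_img[:, :, 2]    # the blue matrix

print(tiny_img.shape)   # (2, 2, 3)
print(red)
# [[255   0]
#  [  0 255]]
```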

Challenge:

Now can you try to convert the image to black and white? All you need to do is use the formula for conversion to grayscale: Y_linear = 0.2126 R_linear + 0.7152 G_linear + 0.0722 B_linear.

Y_linear is what we're after - our black and white image. However, this formula only works if our red, green and blue values are between 0 and 1 - namely in sRGB format. Currently the values in our img range from 0 to 255. So:

  • Divide all the values by 255 to convert them to sRGB, where all the values are between 0 and 1.
  • Next, multiply the sRGB array by the grey_vals to convert the image to grey scale.
  • Finally use Matplotlib's .imshow() with the colormap parameter set to gray (cmap='gray') to look at the results.
In [52]:
grey_vals = np.array([0.2126, 0.7152, 0.0722])
In [53]:
sRGB_array = img / 255
img_grey = sRGB_array @ grey_vals
In [54]:
plt.imshow(img_grey, cmap='gray') # 'grey' doesn't work
Out[54]:
<matplotlib.image.AxesImage at 0x1661f340>
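
The @ in the solution is shorthand for a per-pixel weighted sum of the three colour values. As a sketch on a made-up two-pixel image (the names pixels, via_matmul and via_sum are illustrative):

```python
import numpy as np

grey_vals = np.array([0.2126, 0.7152, 0.0722])

# Made-up 1x2 image: one pure-red pixel and one white pixel, scaled to 0..1.
pixels = np.array([[[255, 0, 0], [255, 255, 255]]]) / 255

via_matmul = pixels @ grey_vals
via_sum = (0.2126 * pixels[:, :, 0]
           + 0.7152 * pixels[:, :, 1]
           + 0.0722 * pixels[:, :, 2])

print(via_matmul.shape)                   # (1, 2): the colour axis is gone
print(np.allclose(via_matmul, via_sum))   # True
```

The red pixel comes out at about 0.2126 and the white pixel at about 1.0, since the three weights sum to 1.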

Challenge: Can you manipulate the images by doing some operations on the underlying ndarrays? See if you can change the values in the ndarray so that:

1) You flip the grayscale image upside down

2) Rotate the colour image

3) Invert (i.e., solarize) the colour image. To do this you need to convert all the pixels to their "opposite" value, so black (0) becomes white (255).

Challenge Solutions¶

In [55]:
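# np.flip with no axis given reverses both rows and columns (a 180-degree rotation);
# np.flipud(img_grey) would flip top to bottom only
flip_img = np.flip(img_grey)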
plt.imshow(flip_img, cmap='gray')
Out[55]:
<matplotlib.image.AxesImage at 0x1685e5b0>
In [56]:
rot90_img = np.rot90(img)
plt.imshow(rot90_img)
Out[56]:
<matplotlib.image.AxesImage at 0x168d6ee0>
In [57]:
solar_img = 255 - img
# solar_img = np.invert(img)  # also works
plt.imshow(solar_img)
Out[57]:
<matplotlib.image.AxesImage at 0x16945b20>
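
Both inversion approaches in the cell above agree for 8-bit unsigned data, because flipping every bit of a uint8 value v gives 255 - v. A sketch on a few made-up pixel values:

```python
import numpy as np

pixels = np.array([0, 1, 100, 254, 255], dtype=np.uint8)

by_subtraction = 255 - pixels     # arithmetic inversion
by_bit_flip = np.invert(pixels)   # bitwise NOT; on uint8 this equals 255 - value

print(by_subtraction)                                # [255 254 155   1   0]
print(np.array_equal(by_subtraction, by_bit_flip))   # True
```

The equivalence relies on the uint8 data type. On a float array scaled to 0..1, np.invert is not defined, and 1 - array would be the arithmetic counterpart.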

Use your Own Image!¶

I've provided a .jpg file in the starting .zip file, so you can try your code out with an image that isn't a raccoon. The key is that your image should have 3 channels (red-green-blue). If you use a .png file with 4 channels there are additional pre-processing steps involved to replicate what we're doing here.
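
The pre-processing for 4-channel images is not shown in this notebook. One simple possibility, offered here as an assumption rather than the author's method, is to discard the fourth (alpha) channel with a slice. The array below is synthetic and stands in for a decoded .png.

```python
import numpy as np

# Synthetic stand-in for a decoded 4-channel .png: height 2, width 3, RGBA.
rgba = np.zeros((2, 3, 4), dtype=np.uint8)
rgba[:, :, 3] = 255            # fully opaque alpha channel

rgb = rgba[:, :, :3]           # keep red, green and blue; drop alpha
print(rgba.shape, rgb.shape)   # (2, 3, 4) (2, 3, 3)
```

This ignores transparency entirely, so images that rely on alpha blending would need more care.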

In [58]:
file_name = 'yummy_macarons.jpg'

Use PIL to open¶

In [59]:
from PIL import Image # for reading image files
my_img = Image.open(file_name)
img_array = np.array(my_img)
img_array
Out[59]:
array([[[ 66,  47,  30],
        [ 67,  48,  31],
        [ 68,  49,  32],
        ...,
        [ 75,  61,  48],
        [ 76,  62,  49],
        [ 76,  62,  49]],

       [[ 66,  48,  28],
        [ 66,  48,  28],
        [ 68,  50,  30],
        ...,
        [ 65,  51,  40],
        [ 65,  51,  40],
        [ 65,  51,  40]],

       [[ 66,  48,  28],
        [ 65,  47,  27],
        [ 67,  49,  29],
        ...,
        [ 67,  53,  44],
        [ 66,  52,  43],
        [ 66,  52,  43]],

       ...,

       [[174, 125,  56],
        [175, 126,  59],
        [176, 128,  64],
        ...,
        [194, 147,  95],
        [196, 148,  99],
        [197, 149, 100]],

       [[175, 126,  57],
        [176, 127,  60],
        [177, 129,  65],
        ...,
        [194, 147,  95],
        [196, 148,  99],
        [197, 149, 100]],

       [[175, 126,  57],
        [176, 127,  60],
        [177, 129,  65],
        ...,
        [193, 146,  94],
        [196, 148,  99],
        [197, 149, 100]]], dtype=uint8)
In [60]:
img_array.ndim
Out[60]:
3
In [61]:
img_array.shape
Out[61]:
(533, 799, 3)
In [62]:
plt.imshow(255 - img_array)
Out[62]:
<matplotlib.image.AxesImage at 0x16bfeee0>
In [63]:
plt.imshow(np.rot90(img_array))
Out[63]:
<matplotlib.image.AxesImage at 0x1d5b5370>
In [64]:
plt.imshow((img_array / 255)@grey_vals, cmap='gray')
Out[64]:
<matplotlib.image.AxesImage at 0x2083ab20>
In [65]:
plt.imshow(np.flip(img_array))
Out[65]:
<matplotlib.image.AxesImage at 0x16bbf0d0>
In [66]:
# Flip rows and columns but keep the colour channels in order (np.flip(img_array) above also reverses the channel axis, so red and blue swap)
plt.imshow(img_array[::-1, ::-1, :])
Out[66]:
<matplotlib.image.AxesImage at 0x1fdb6c70>
In [67]:
# This also works
plt.imshow(img_array[::-1])
Out[67]:
<matplotlib.image.AxesImage at 0x21f694f0>
In [68]:
# or using the np.flipud() function:
plt.imshow(np.flipud(img_array)) # flip vertically, 'ud' means Up and Down
Out[68]:
<matplotlib.image.AxesImage at 0x221284c0>
In [69]:
# To flip vertically with np.flip(), pass axis=0 as the second argument
plt.imshow(np.flip(img_array, axis=0))
Out[69]:
<matplotlib.image.AxesImage at 0x222aa850>
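
The difference between these flips is easiest to see on a tiny made-up array. The sketch below contrasts np.flip with no axis, which reverses every axis including the colour channels, with the row-only variants used above.

```python
import numpy as np

# Made-up 2x1 "image": pixel [10, 20, 30] sits above pixel [40, 50, 60].
tiny = np.array([[[10, 20, 30]],
                 [[40, 50, 60]]], dtype=np.uint8)

all_axes = np.flip(tiny)            # reverses rows, columns AND colour channels
rows_only = np.flip(tiny, axis=0)   # reverses rows only

print(rows_only[0, 0])   # [40 50 60]: bottom pixel moved to the top, colours intact
print(all_axes[0, 0])    # [60 50 40]: same pixel, but its channel order is reversed
```

On this array, np.flipud(tiny) and tiny[::-1] give the same result as rows_only.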

Learning Points & Summary¶

In this lesson we looked at how to:

  • Create arrays manually with np.array()

  • Generate arrays using .arange() to generate a vector of values ranging from start to stop (e.g. a = np.arange(10, 30))

  • Use enumerate or list comprehension or np.nonzero(array) to return all the indices of the non-zero elements

  • Generate random arrays using the legacy version:

    from numpy import random
    np.random.random((3,3,3))

    or using the new version:

    from numpy.random import default_rng
    rng = default_rng()
    rng.random((3,3,3))

  • Generate arrays with equally spaced values with .linspace():

    x = np.linspace(start=0, stop=100, num=9)

  • Analyse the shape and dimensions of an ndarray, with .shape and .ndim

  • Slice and subset an ndarray based on its indices

  • Generate image arrays: noise = rng.random((128,128,3)) (128x128 pixels, and 3 values for R,G,B)

  • Do linear algebra like operations with scalars and matrix multiplication:

    • Linear algebra with vectors (1-dimensional arrays): + and * operate element by element
    • Scalars and broadcasting: NumPy's broadcasting makes ndarray shapes compatible
    • Matrix multiplication with the @ operator and the np.matmul(a1, b1) function
  • Manipulate images in the form of ndarrays:
    • display an img with Matplotlib: plt.imshow(), also in grey: plt.imshow(img_grey, cmap='gray')
    • convert a coloured img to grey using a formula:
      grey_vals = np.array([0.2126, 0.7152, 0.0722])
      sRGB_array = img / 255
      img_grey = sRGB_array @ grey_vals
    • using Image() from PIL library to open an img file as an array:
      from PIL import Image # for reading image files
      my_img = Image.open(file_name)
      img_array = np.array(my_img)
    • using methods such as .rot90(), .invert(), .flip() or 255 - img